40 research outputs found

    Accuracy of inter-residue distance prediction for adenine-binding proteins from the SOIPPA dataset.

    <p>The Pearson correlation coefficient (PCC) and the mean squared error (MSE) are calculated between the actual pairwise Cα-Cα distances upon the superposition of binding ligands and those predicted by SVR from residue-level scores. The accuracy is reported separately for different binding ligands and target protein conformations, including crystal structures and high- and moderate-quality protein models.</p><p>a: Pearson correlation coefficient.</p><p>b: Mean squared error in Å.</p>
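The two accuracy measures named in this caption can be sketched as follows. This is a minimal illustration using only the standard definitions of PCC and MSE; the function names and the toy distance values are assumptions made for the example and do not come from the original study.

```python
import math


def pearson_cc(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)


def mean_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical distances (in angstroms), invented for illustration only.
actual_distances = [4.0, 6.0, 8.0, 10.0]
predicted_distances = [5.0, 7.0, 9.0, 11.0]

pcc = pearson_cc(actual_distances, predicted_distances)
mse = mean_squared_error(actual_distances, predicted_distances)
```

In this toy case every prediction is offset by exactly 1 Å, so the correlation is perfect while the MSE is non-zero, which is why the two measures are usually reported together.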

    Performance of <i>e</i>MatchSite, PocketMatch and SiteEngine on the SOIPPA dataset of adenine-binding proteins.

    <p>The accuracy of local alignment predictors is compared to that using global sequence and structure alignments for (<b>A</b>) crystal target structures, (<b>B</b>) high-, and (<b>C</b>) moderate-quality protein models. TPR and FPR are the true and false positive rates, respectively; gray area corresponds to a random prediction.</p>
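The TPR/FPR curves described in this caption are standard ROC curves. The sketch below shows one common way to derive ROC points and the area under the curve from similarity scores; the function names, scores, and labels are hypothetical and are not taken from the benchmarked tools.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a score threshold.

    scores: higher means "more likely a true match".
    labels: 1 (or True) for a true match, 0 (or False) otherwise.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all items tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores and labels, invented for illustration only.
example_points = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
example_auc = area_under_curve(example_points)
```

A random predictor follows the diagonal from (0, 0) to (1, 1), which corresponds to an area of 0.5.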

    Effects of target structure distortions on the quality of local alignments of ATP-binding sites.

    <p>MCC is the Matthews correlation coefficient calculated against the reference alignments constructed using target crystal structures.</p>
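For reference, the Matthews correlation coefficient can be computed from confusion-matrix counts as sketched below. The formula is the standard one; how true and false positives are counted for an alignment is not stated in the caption, so the counts in the example are purely hypothetical.

```python
import math


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (a common convention,
    adopted here as an assumption).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, invented for illustration only.
perfect = matthews_cc(tp=5, fp=0, tn=5, fn=0)
chance_level = matthews_cc(tp=5, fp=5, tn=5, fn=5)
```

MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect agreement with the reference).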

    Construction of sequence order-independent binding site alignments by <i>e</i>MatchSite.

    <p>Two target proteins are ATP-dependent DNA ligase (PDB-ID: 1a0iA, yellow) and histamine N-methyltransferase (PDB-ID: 2aotA, red). Left (<b>A</b>–<b>D</b>) and right (<b>E</b>–<b>H</b>) panels show the alignment of binding sites in the crystal structures and protein models, respectively. (<b>A</b>, <b>E</b>) Matrices of pairwise Cα-Cα distances between two binding sites predicted by SVR. Residue indexes are shown in the first column and row. Sets of residue pairs that have the smallest Cα-Cα distances identified by the Kuhn-Munkres algorithm are highlighted in green. (<b>B</b>, <b>F</b>) Sequence order-independent alignments of two binding sites constructed from residue pairs that have the smallest Cα-Cα distances; arrows indicate equivalent pairs. (<b>C</b>, <b>G</b>) Protein structures are superposed according to the local alignment of their binding sites; binding residues and predicted pocket centers are shown as solid sticks and balls, respectively. (<b>D</b>, <b>H</b>) Relative orientation of binding ligands upon the local alignment of target binding sites; ATP in 1a0iA and S-adenosyl-L-homocysteine in 2aotA are shown as solid and transparent sticks, respectively.</p>
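The assignment step in panels A and E can be illustrated with a small sketch. Note that this is not the Kuhn-Munkres algorithm itself: for brevity it uses an exhaustive search over permutations, which reaches the same minimum total distance but is practical only for very small matrices. The function name and the distance matrix are hypothetical.

```python
from itertools import permutations


def min_distance_assignment(dist):
    """Pair each row residue with a distinct column residue so that the
    summed distance is minimal (brute force; rows must not exceed columns).

    Returns (pairs, total) where pairs is a list of (row, column) indices.
    """
    if not dist or not dist[0]:
        raise ValueError("distance matrix must be non-empty")
    n_rows, n_cols = len(dist), len(dist[0])
    if n_rows > n_cols:
        raise ValueError("transpose the matrix so that rows <= columns")
    best_total = float("inf")
    best_pairs = []
    for columns in permutations(range(n_cols), n_rows):
        total = sum(dist[row][col] for row, col in enumerate(columns))
        if total < best_total:
            best_total = total
            best_pairs = list(enumerate(columns))
    return best_pairs, best_total


# Hypothetical predicted distances (angstroms) between residues of two sites.
predicted = [
    [9.0, 1.0, 8.0],
    [2.0, 9.0, 7.0],
    [8.0, 9.0, 0.5],
]
pairs, total = min_distance_assignment(predicted)
```

Here row 0 pairs with column 1 and row 1 with column 0, so the resulting alignment does not follow sequence order, which is the point of a sequence order-independent alignment. For realistic binding-site sizes a polynomial-time solver would be the natural replacement for the permutation loop.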

    Prediction of aligned residue pairs using machine learning for SAH-binding proteins from the SOIPPA dataset.

    <p>The correlation between the actual pairwise Cα-Cα distances upon the reference alignment of binding sites and those predicted by SVR is shown for (<b>A</b>) crystal structures, (<b>B</b>) high-, and (<b>C</b>) moderate-quality protein models, respectively. (<b>D</b>) The ROC plot for the prediction of equivalent residue pairs using SVC; CS – crystal structures, HQ – high-quality, MQ – moderate-quality models, R – random prediction.</p>

    Global and local structure quality of adenine-binding proteins from the SOIPPA dataset.

    <p>a: Heavy-atom RMSD calculated over binding residues.</p><p>b: Distance between the predicted pocket center and the geometric center of the bound ligand.</p><p>c: Matthews correlation coefficient for predicted binding residues.</p><p>High- and moderate-quality models are constructed by <i>e</i>Thread. Ligand binding sites and residues are detected by <i>e</i>FindSite.</p>

    Performance comparison for <i>e</i>MatchSite, PocketMatch, SiteEngine and sup-CK.

    <p>Binding site matching is conducted using the (<b>A</b>–<b>C</b>) Kahraman and (<b>D</b>–<b>F</b>) Homogeneous datasets. The accuracy of local alignment predictors is compared to that using global sequence and structure alignments for (<b>A</b>, <b>D</b>) crystal target structures, (<b>B</b>, <b>E</b>) high-, and (<b>C</b>, <b>F</b>) moderate-quality protein models. TPR and FPR are the true and false positive rates, respectively; gray area corresponds to a random prediction.</p>

    Comparison of sequence order-independent binding site alignments constructed by SiteEngine and <i>e</i>MatchSite for adenine-binding proteins from the SOIPPA dataset.

    <p>The alignment accuracy is assessed by the average ± standard deviation of the ligand heavy-atom RMSD calculated upon the superposition of aligned binding residues.</p><p>a: Δ<i>RMSD</i> is calculated by subtracting from the RMSD the ligand heavy-atom root-mean-square deviation obtained upon the superposition of the two ligands.</p>

    Stereochemical quality of protein models.

    <p>Models constructed by <i>e</i>Thread/Modeller and <i>e</i>Thread/TASSER-Lite are compared to crystal structures as well as models built by a simple single-template approach, PSI-BLAST/Nest and a standard version of TASSER-Lite. The quality is assessed by the percentage of residues assigned to different regions of the Ramachandran map by PROCHECK.</p><p>a: According to PROCHECK classification: core – most favored regions, allow – additional allowed regions, gener – generously allowed regions, disall – disallowed regions.</p>

    <em>e</em>Thread: A Highly Optimized Machine Learning-Based Approach to Meta-Threading and the Modeling of Protein Tertiary Structures

    <div><p>Template-based modeling that employs various meta-threading techniques is currently the most accurate, and consequently the most commonly used, approach to protein structure prediction. Despite the evident progress in this field, accurate structure models cannot be constructed for a significant fraction of gene products; thus, the development of new algorithms is required. Here, we describe the development, optimization and large-scale benchmarking of <em>e</em>Thread, a highly accurate meta-threading procedure for the identification of structural templates and the construction of corresponding target-to-template alignments. <em>e</em>Thread integrates ten state-of-the-art threading/fold recognition algorithms in a local environment and makes extensive use of machine learning techniques to carry out fully automated template-based protein structure modeling. Tertiary structure prediction employs two protocols based on widely used modeling algorithms: Modeller and TASSER-Lite. As a part of <em>e</em>Thread, we also developed <em>e</em>Contact, a Bayesian classifier for the prediction of inter-residue contacts, and <em>e</em>Rank, which effectively ranks the generated protein models and provides reliable confidence estimates for structure quality assessment. With closely related templates excluded from the modeling process, <em>e</em>Thread generates models that are correct at the fold level for >80% of the targets; 40–50% of the constructed models are of very high quality and would be considered accurate at the family level. Furthermore, in large-scale benchmarking, we compare the performance of <em>e</em>Thread to several alternative methods commonly used in protein structure prediction. Finally, we estimate the upper bound for this type of approach and discuss directions for further improvement.</p> </div>